Locating Text in Historical Collection Manuscripts
Authors
Abstract
Documents in historical collections are often poorly preserved and prone to degradation. This work applies state-of-the-art digital image binarization and text identification techniques to digitized documents so that their content can be exploited efficiently. We propose a novel methodology that preserves meaningful textual information in low-quality historical documents. The method was developed within D-SCRIBE, a Hellenic GSRT-funded R&D project whose goal is an integrated system for digitizing and processing old Hellenic manuscripts. Tests on numerous low-quality historical manuscripts show that our methodology outperforms current state-of-the-art adaptive thresholding techniques.
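The abstract benchmarks against adaptive (local) thresholding, in which each pixel gets its own threshold computed from its neighbourhood. This is useful on stained or unevenly lit pages, where a single global threshold fails. Below is a minimal sketch of one well-known method of that family, Sauvola's local threshold T = m(1 + k(s/R - 1)), where m and s are the mean and standard deviation of the window around the pixel. It is illustrative only and is not the D-SCRIBE methodology, which the abstract does not describe. The function names and the defaults (window=15, k=0.2, R=128) are assumptions chosen for the example.

```python
import math
from typing import List

# Grayscale image as rows of ints: 0 = black ink, 255 = white paper.
Image = List[List[int]]


def integral(img: Image, square: bool = False) -> List[List[float]]:
    """Summed-area table with a zero border; optionally sums squared values."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            v = img[y][x]
            row_sum += v * v if square else v
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def window_sum(ii: List[List[float]], y0: int, x0: int, y1: int, x1: int) -> float:
    """Sum over rows y0..y1-1 and columns x0..x1-1 in O(1)."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


def sauvola_binarize(img: Image, window: int = 15, k: float = 0.2,
                     r: float = 128.0) -> List[List[int]]:
    """Illustrative Sauvola thresholding; returns 1 for text (dark), 0 for background."""
    h, w = len(img), len(img[0])
    s1 = integral(img)               # local sums -> local mean
    s2 = integral(img, square=True)  # local sums of squares -> local variance
    half = window // 2
    out = []
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        row = []
        for x in range(w):
            # Clip the window at the image border.
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            n = (y1 - y0) * (x1 - x0)
            mean = window_sum(s1, y0, x0, y1, x1) / n
            var = window_sum(s2, y0, x0, y1, x1) / n - mean * mean
            std = math.sqrt(max(var, 0.0))  # guard against tiny negative rounding
            t = mean * (1.0 + k * (std / r - 1.0))
            row.append(1 if img[y][x] <= t else 0)
        out.append(row)
    return out
```

The integral images make each window's mean and variance cost O(1), so the run time does not grow with the window size. In a flat region the standard deviation is zero, so the threshold drops to (1 - k) times the local mean. Uniform paper or uniform stains are therefore labelled as background rather than text.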
Similar resources
Margins are more important than text, Historical values of margins, memorial notes and colophons of Manuscripts in Zoroastrian tradition
In the Zoroastrian tradition, the most important challenge and the most ambiguous issue is the vagueness of history and the neglect of time and chronology. Perhaps this ambiguity of history in the Zoroastrian tradition stems from the view that historical time is limited, that its beginning and end are clear, and that goodness will eventually prevail. Since early times, the Zoroastrian re...
Within-text and Out-of-text Structures of Islamic-Iranian Manuscripts
Despite some differences, Islamic-Iranian manuscripts share common out-of-text and within-text structures. Authors, scribes and transcription centers followed these structures for centuries, while the transcription tradition was dominant throughout the Islamic world. This article examines these common features in detail. Some manuscript folios preserved in National Library ...
Guideline: Multiple Hierarchies
As the title of the Dagstuhl Seminar Digital Historical Corpora Architecture, Annotation, and Retrieval already suggests, corpus architecture and corpus annotation are important topics for representing (historical) texts. In particular, the restriction of SGML-based markup languages to tree-structured annotations raises special problems when dealing with manuscripts: How is it possible to represe...
Iranian Scholars’ Revision of Their Submitted Manuscripts: Signaling Impersonality in Text
Nonnative English-speaking scholars have often been reported to be at a disadvantage vis-à-vis their English native counterparts when it comes to writing a publishable research article (RA). When they submit their manuscripts to English-language journals, they sometimes receive comments criticizing their faulty English. One area of difficulty for these authors is the grammaticalization of neutr...
Corrigendum: Virtual unrolling and deciphering of Herculaneum papyri by X-ray phase-contrast tomography
A collection of more than 1800 carbonized papyri, discovered in the Roman 'Villa dei Papiri' at Herculaneum, is the only classical library to have survived from antiquity. These papyri were charred during the eruption of Vesuvius in 79 A.D., a circumstance that providentially preserved them until now. This magnificent collection contains an impressive number of treatises by Greek philosophers and, especially, ...